Binaural Processing of Multiple Sound Sources
Final  Progress  Report  (7/13/2012-7/14/2015) 

The  AFOSR  Grant,  FA9550-12-1-0312,  has  supported  the  Spatial  Hearing  Laboratory 
(SHL)  at  Arizona  State  University  for  the  past  four  years.  The  research  conducted  in  the 
SHL has involved four main topics: Sound Source Localization by Cochlear Implant (CI)
Patients, Single Sound Source Localization Accuracy, Multiple Sound Source
Localization Identification, and Sound Source Localization When Listeners Move. The
CI research was also supported by an NIH grant (“Cochlear Implant Performance in
Realistic Listening Environments,” Dr. Michael Dorman, Principal Investigator, Dr.
William Yost unpaid advisor). The SHL was also used in 2014 to support a small
short-term research project funded by a contract from the Boeing Corporation awarded to Dr.
Yost.  This  project  involved  collecting  data  for  Boeing  Corporation  (there  were  no 
publications  or  presentations  of  these  data)  on  the  sound  source  localization  of  very  low 
frequency  (<60  Hz)  sounds.  The  other  three  topics  cited  above  are  entirely  within  the 
scope  of  the  AFOSR  grant. 

Sound  Source  Localization  by  Cochlear  Implant  Patients 

Many experiments have been conducted using a methodology developed by Dr. Yost to
efficiently measure sound source localization accuracy in the front azimuth plane at pinna
height. Baseline sound source localization accuracy data from 48 normal hearing listeners
were obtained and published (1). Identical measures were then obtained from a wide
variety of CI patients, including patients with a CI for each ear (2, 6, 12, 14, 35, 36, 41,
44), a CI at one ear and a hearing aid at the other ear (2, 32, 34, 35, 37, 38, 39, 40, 41, 42,
44), and a CI at one ear and unaided normal (or near normal) hearing at the other ear (10).
In each study the basic measure was a comparison of sound source localization accuracy
for a normal hearing control group with that for patients in the various CI groups. There
was a wide variety of findings that are informative about both normal hearing sound
source localization and that achievable by CI users. In all cases CI users’ performance
indicated poorer sound source localization accuracy than the normal controls, and in some
cases CI patients were unable to localize sounds above a chance level of performance.
When CI patients in the different groups could localize the source of sound, they did so
mainly when sounds contained high-frequency information (sounds with a bandwidth of
either 125-8000 Hz or 2000-8000 Hz). Most, but not all, CI patients performed very
poorly, if at all, in the sound source localization task when the sound had a 125-500 Hz
bandwidth. These results suggest that CI patients who are provided information to both
ears can localize sound sources when the probable cue is the interaural level difference
(ILD cues are used to localize high-frequency sounds). ILD cues are available via a CI to
these patients, but due to the way in which the cochlear implant operates, interaural time
difference (ITD) cues, which provide location information at low frequencies, are not
available. These results are helping inform CI development and use so that CIs may
provide better spatial information in the future.

The results also clearly document that CI users who receive acoustic input to both ears
can, in most cases, localize sound sources, which they cannot do with a CI fit to only one
ear.

Single Sound Source Localization Accuracy

A  series  of  studies  using  sound  source  localization  identification  was  conducted  both  to 
collect  baseline  data  for  studies  of  multiple  sound  source  localization  and  because  there 
are  few  comprehensive  studies  of  sound  source  localization  identification  in  a  near  open 
field  in  the  front  azimuth  plane  (conditions  in  which  sound  source  localization  is 
primarily,  if  not  entirely,  based  on  binaural  processing). 

The first study (5) used 48 normal hearing subjects in a sound source localization
identification task. The stimuli were filtered 200-ms noise bursts: 125-500 Hz (LF: Low
Frequency  condition),  2000-8000  Hz  (HF:  High  Frequency  condition),  and  125-8000  Hz 
(BB:  Broad  Band  condition).  Sound  source  localization  is  most  likely  based  on  interaural 
time  difference  (ITD)  processing  in  the  LF  condition,  most  likely  due  to  interaural  level 
differences  (ILD)  in  the  HF  condition,  and  most  likely  due  to  both  ITD  and  ILD 
processing  in  the  BB  condition.  The  results  from  the  “large  n  (48  subjects)”  study 
indicated  that  sound  source  localization  accuracy  in  the  identification  task  when 
expressed  as  mean  root-mean-square  (rms)  error  was  6.2°  independent  of  the  type  of 
filtering  used.  That  is,  for  these  two  or  more  octave  wide  noise  stimuli  sound  source 
localization  accuracy  is  not  different  when  ITD  processing,  or  ILD  processing,  or  both 
ITD  and  ILD  processing  are  used. 
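
The rms error measure used throughout these studies can be sketched as follows. This is a minimal illustration, assuming each trial pairs the actual loudspeaker azimuth with the azimuth the listener reported; the function name and the trial values are hypothetical and are not data from the report.

```python
import math

def rms_error_deg(target_deg, response_deg):
    """Root-mean-square localization error in degrees over paired trials."""
    if len(target_deg) != len(response_deg) or not target_deg:
        raise ValueError("need equal-length, non-empty trial lists")
    squared = [(r - t) ** 2 for t, r in zip(target_deg, response_deg)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical trials (NOT data from the report): actual source azimuths
# and the azimuths the listener identified, both in degrees.
targets = [-30.0, 0.0, 30.0, 60.0]
responses = [-24.0, 0.0, 36.0, 54.0]
print(round(rms_error_deg(targets, responses), 2))
```

A per-listener value computed this way would then be averaged across listeners to give a mean rms error of the kind quoted above.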

In  a  large  scale  follow-up  study  (8)  sound  source  localization  accuracy  was  measured  as  a 
function  of  the  bandwidth  and  center  frequency  (CF)  of  the  bandpass  filters  used  to 
process  the  200-ms  noise  bursts.  Bandwidths  from  1/20  of  an  octave  to  two  octaves 
(along with tonal stimuli) were used, and the CFs of the filters (or the tonal frequencies)
were 250 Hz (the spectral region where ITD processing most likely occurs), 4000
Hz (the spectral region where ILD processing most likely occurs), or 2000 Hz (the
spectral region where neither ITD nor ILD cues provide good information about spatial
location of sound sources). A broadband noise (125-8000 Hz) was also used, for which it
is  assumed  that  listeners  can  use  both  ITD  and  ILD  cues  for  sound  source  localization. 
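
As a rough illustration of how a center frequency and a bandwidth in octaves map to filter band edges, the sketch below assumes the passband is geometrically centered on the CF. The report does not state how the filters were specified, so this centering and the function name are assumptions.

```python
def band_edges_hz(center_hz, bandwidth_octaves):
    """Lower and upper band edges for a band geometrically centered on center_hz."""
    half = bandwidth_octaves / 2.0
    return center_hz * 2.0 ** (-half), center_hz * 2.0 ** half

# Two-octave bands at the 250-Hz and 4000-Hz CFs, and a 1/20-octave band at 2000 Hz.
for cf, bw in [(250.0, 2.0), (4000.0, 2.0), (2000.0, 1.0 / 20.0)]:
    low, high = band_edges_hz(cf, bw)
    print(cf, bw, round(low, 1), round(high, 1))
```

Under this assumption, a two-octave band at a 250-Hz CF spans 125-500 Hz and one at a 4000-Hz CF spans 2000-8000 Hz, which matches the LF and HF conditions of the first study.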

The results showed that when the bandwidth of the noise was broader than one octave,
sound source localization accuracy as measured by rms error in degrees did not vary as a
function of CF, and rms error was smallest for these broadband noise stimuli (i.e., sound
source localization accuracy was best and the rms error did not decrease for bandwidths
greater than one octave). As the bandwidth of the noise decreased from one octave to
1/20th of an octave, rms error increased, and the amount of the increase was CF dependent
such that best performance always occurred for the CF=250-Hz condition, worst
performance for the CF=2000-Hz condition, and intermediate performance for the
CF=4000-Hz condition.

This study was conducted for a 200-ms noise burst presented at 65 dBA. The literature on
spatial hearing using headphone-delivered stimuli shows that sound duration affects
interaural time discrimination thresholds. By implication, duration would be expected
to affect sound source localization accuracy in an open field. There are few data in the
literature related to sound source localization accuracy and duration when sounds are
presented in an open field. Since there are also no clear data in the literature on the effect
of overall level on sound source localization accuracy, we (14) decided to measure sound
source localization accuracy as a function of noise duration and overall level. We did so
for two-octave and 1/10th-octave wide noises at CFs of 250 Hz and 4000 Hz.

The results (14) were that sound source localization accuracy was poorer for the
1/10th-octave than for the two-octave wide noise, as we showed previously (8). Accuracy did not
change for the two-octave wide noise for the two CFs, but for the 1/10th-octave wide
noise, accuracy was lower for the 4000-Hz CF. All of these results are the same as for the
bandwidth/CF sound source localization accuracy study described above. In NONE of the
conditions did accuracy vary as a function of overall sound level (over the range of 25 to
85 dBA) or duration (over the range of 25 ms to 450 ms). Thus, in the open field,
unlike  under  headphone  conditions,  sound  source  localization  processing  does  not  appear 
to  depend  on  overall  sound  level  or  duration.  We  are  not  sure  why  the  open  field  results 
differ  from  the  results  obtained  over  headphones  when  duration  is  varied,  but  at  least  one 
other  study  (46)  also  showed  that  sound  source  localization  accuracy  in  an  open  field 
does  not  depend  on  noise  duration. 

It  is  also  the  case  that  for  headphone  delivered  stimuli,  ITD  discrimination  thresholds  and 
the  position  of  lateralized  images  vary  as  a  function  of  the  envelope  of  the  sound  in  high- 
frequency  regions  where  ITD  processing  would  not  occur  due  to  the  temporal  fine 
structure  of  the  stimuli.  These  lateralization  results  strongly  suggest  that  ITD  processing 
can occur based on the envelope of the sound, as long as the envelope fluctuations are
slower  than  approximately  300  Hz.  It  is  also  the  case  that  ITD  processing  based  on 
envelope  ITDs  is  worse  than  that  based  on  temporal  fine  structure  cues.  Almost  no 
studies  have  investigated  envelope  ITD  processing  in  the  open  field,  and  the  few  studies 
that  have  (4,  47)  have  not  found  evidence  for  envelope  ITD  processing.  Thus,  we 
conducted  a  study  (44)  investigating  the  effect  of  envelope  on  sound  source  localization 
accuracy  in  the  open  field. 

The  study  used  sinusoidal  amplitude  modulation  (SAM)  and  transposed  stimulus 
amplitude  modulation  (TSAM)  of  filtered  noise  stimuli  and  a  4000-Hz  tone  (a  stimulus 
often  used  in  headphone  lateralization  studies).  The  study  also  investigated  filtered  and 
unfiltered  click  trains,  when  the  click  rate  and  number  of  clicks  were  varied  (stimulus 
parameters  that  affect  headphone  measurements  of  ITD  processing).  Introducing  an 
envelope  did  not  change  the  sound  source  localization  accuracy  of  any  of  the  noise 
stimuli  independent  of  the  type  of  envelope  modulation.  A  TSAM  4000-Hz  tone  had  a 
slightly  lower  rms  error  (1°)  as  compared  to  the  rms  error  for  an  un-modulated  4000-Hz 
tone.  Sound  source  localization  accuracy  for  click  stimuli  is  slightly  lower  (1-2°  rms 
error)  than  for  short  duration  (25  ms)  noise  stimuli,  but  rms  error  does  not  change  as  a 
function  of  adding  more  clicks  or  as  a  function  of  the  rate  at  which  the  multiple  clicks  are 
presented. Thus, unlike the changes that occur over headphones in lateralization tasks,
providing an envelope does not change, or barely changes, sound source localization
accuracy. A major reason is that in the open field ILD cues are always available, whereas
in the headphone studies ILDs are always set to zero, not allowing ILD
changes to be used as a basis of lateralization performance. Even when sound source
localization  accuracy  is  poor  in  the  open  field  (i.e.,  for  narrow  band  noise  stimuli  with 
high-frequency  CFs  or  for  a  4000-Hz  tone),  providing  an  envelope  to  the  narrow  band 
stimuli  usually  does  not  improve  sound  source  localization  accuracy,  and  if  it  does  the 
improvement  is  very  small.  It  is  also  the  case  that  sound  source  localization  accuracy  is 
measured  in  the  open  field  and  ITD  discrimination  is  often  measured  over  headphones. 
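
For readers unfamiliar with the envelope manipulation, the sketch below generates a sinusoidally amplitude modulated (SAM) tone. The 4000-Hz carrier and the 200-ms duration come from the report; the modulation rate, modulation depth, sample rate, and function name are assumptions chosen only for illustration, and the transposed (TSAM) variant is not shown.

```python
import math

def sam_tone(carrier_hz, mod_hz, depth, duration_s, sample_rate_hz):
    """Samples of a SAM tone: (1 + depth * sin(2*pi*fm*t)) * sin(2*pi*fc*t)."""
    n = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * mod_hz * t)
        samples.append(envelope * math.sin(2.0 * math.pi * carrier_hz * t))
    return samples

# 4000-Hz carrier; the 100-Hz modulation rate, full depth, and 44.1-kHz
# sample rate are assumed values, not parameters taken from the study.
tone = sam_tone(4000.0, 100.0, 1.0, 0.2, 44100)
print(len(tone), round(max(abs(x) for x in tone), 3))
```

With a depth of 0 the function returns an unmodulated tone, which corresponds to the comparison condition described above.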

Thus,  this  series  of  single  sound  source  localization  studies  suggests  that  the  main 
variable  that  influences  sound  source  localization  accuracy  is  stimulus  bandwidth.  The 
results  also  show  that  for  narrow  bandwidth  stimuli  (<1  octave  wide)  best  accuracy 
occurs for low-frequency sounds (<500 Hz), worst performance for mid-frequency
sounds  (around  2000  Hz),  and  intermediate  accuracy  for  high-frequency  sounds  (>4000 
Hz).  The  duration  and  overall  level  of  sound,  as  well  as  a  sound’s  envelope  appear  to 
have  very  little  effect  on  sound  source  localization  accuracy  (at  least  in  the  front  azimuth 
hemifield).  These  results  do  not  always  occur  when  interaural  discrimination  or 
lateralization  are  measured  over  headphones.  Additional  studies  are  being  planned  to 
investigate  the  reasons  for  the  differences  between  open  field  and  lateralization  measures. 

Multiple  Sound  Source  Localization 

A  large  scale  study  was  completed  (4)  in  which  subjects  were  asked  to  determine  the 
location  of  two  simultaneously  presented  sound  sources  each  producing  a  200-ms, 
wideband,  and  independently  generated,  noise  burst.  The  main  finding  of  the  study  was 
that  listeners  can  localize  the  position  of  each  of  the  two  sources  (in  the  front  azimuth 
field),  but  not  as  well  as  they  can  localize  the  position  of  a  single  noise  burst.  Since  two 
independently  generated  wideband  noise  bursts  are  as  similar  as  two  sounds  can  be  in 
terms  of  sound  source  localization,  the  data  suggest  that  the  location  of  almost  any  two 
sounds  presented  at  the  same  time  could  probably  be  determined  (i.e.,  all  other  types  of 
sounds  would  have  greater  acoustic  differences  which  could  be  used  as  a  basis  for  sound 
source  localization). 

The  paper  suggested  a  process  by  which  the  auditory  system  might  determine  the  location 
of  two  (or  maybe  more)  simultaneously  presented  sounds.  The  process  consists  of 
dividing the combined sound waveform from two (or more) sources into a matrix of
small time/frequency cells. Then the ITDs and ILDs of the waveform in each cell are
computed. If a sufficient number of the cells in the combined waveform matrix have ITD
and  ILD  values  consistent  with  those  of  one  or  the  other  of  the  two  (or  more)  sound 
sources  when  they  are  presented  alone,  then  there  might  be  sufficient  information  in  this 
matrix  to  identify  the  two  locations.  An  amplitude  modulation  noise  task  was  used  to  test 
the  idea  of  this  approach.  The  results  suggested  that  such  a  process  might  be  used  to 
determine  the  location  of  at  least  two  sound  sources,  and  the  experiment  and  its  analysis 
suggested  that  the  temporal  width  of  the  cells  in  such  a  matrix  might  be  on  the  order  of  5 
ms  and  the  spectral  height  about  one  critical  bandwidth. 
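
The cell-counting idea can be sketched as follows. This is only an illustration of the logic, assuming the ITD and ILD of each time/frequency cell have already been estimated; the tolerances, the candidate cue values, the cell values, and the function names are all hypothetical and are not taken from the paper (4).

```python
def count_consistent_cells(cells, candidates, itd_tol_us=50.0, ild_tol_db=1.0):
    """Count cells whose (ITD, ILD) pair matches each candidate source's cues."""
    counts = {name: 0 for name in candidates}
    for itd_us, ild_db in cells:
        for name, (cand_itd_us, cand_ild_db) in candidates.items():
            itd_ok = abs(itd_us - cand_itd_us) <= itd_tol_us
            ild_ok = abs(ild_db - cand_ild_db) <= ild_tol_db
            if itd_ok and ild_ok:
                counts[name] += 1
    return counts

def detected_sources(counts, min_cells):
    """Candidates supported by at least min_cells consistent cells."""
    return sorted(name for name, count in counts.items() if count >= min_cells)

# Hypothetical cues (ITD in microseconds, ILD in dB) that each source would
# produce when presented alone, and hypothetical cell estimates for the mixture.
candidates = {"left": (-300.0, -6.0), "right": (300.0, 6.0)}
cells = [(-290.0, -5.5), (310.0, 6.2), (-305.0, -6.4),
         (20.0, 0.3), (295.0, 5.8), (-280.0, -6.9)]
counts = count_consistent_cells(cells, candidates)
print(counts, detected_sources(counts, min_cells=2))
```

Cells in which the two sources overlap heavily (such as the fourth cell here) match neither candidate and are simply ignored, which is why the scheme needs only a sufficient number of consistent cells rather than all of them.
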

The experiment described above tested one vs. two spatially separated sound sources.
Another  experiment  (16)  was  conducted  in  order  to  obtain  information  about  how  many 
simultaneous  spatially  separated  sound  sources  might  be  localizable.  Both  the  accuracy  of 
determining  how  many  sources  were  presenting  sound  (numerosity)  and  listeners’  ability 
to  determine  the  location  of  these  sources  were  measured.  The  sounds  were  either  one- 
word  country  names  or  tones  of  different  frequencies.  The  results  strongly  imply  that  no 
more than four simultaneous spatially-separated word sources and no more than two or
three simultaneously presented spatially-separated tonal sources can be determined, even
when  as  many  as  eight  sources  produce  sound.  Two  other  very  recent  studies  (48,  49) 
have  produced  similar  results. 

In  the  studies  described  above  and  in  most  of  the  literature  on  multiple  sound  source 
localization the sound sources were stationary. A study (31) was conducted to determine
if moving sound sources might allow for better segregation of sound sources, as relative
motion of visual objects is a powerful cue for separating foreground objects from
background objects. The stimuli were speech sounds (words) and tones whose
frequencies were either harmonics of 250 Hz, harmonics of 250 Hz except for the second
harmonic, whose frequency was “mistuned” from 500 Hz to 613 Hz (e.g., rather than
tones of 250, 500, 750, and 1000 Hz for the harmonic case, the tones were 250, 613, 750,
and 1000 Hz for this “mistuned harmonic case”), or random frequencies, each
within an octave of the corresponding harmonic of 250 Hz. Sounds were presented
simultaneously  at  different  spatial  source  locations  (either  three,  four,  or  six  locations), 
with  the  spatial  locations  maximally  different  (e.g.,  for  four  tones,  at  0°,  90°,  180°,  and 
270°).  Either  all  sounds  (three,  four,  or  six)  rotated  around  the  azimuth  circle  at  the  same 
time  or  one  sound  rotated  while  the  other  sounds  remained  at  fixed  locations.  When 
either all words or one word rotated, listeners could determine the direction of rotation.
However, when all words rotated, listeners could do so only by attending to one word at a
time; i.e., attending to the “chorus” of all of the words (three, four, or six) did not produce
any motion perception. Listeners could not determine (i.e., performance was at chance) the
direction of rotation for harmonically related tones, and performance was marginally
better than chance (approximately 70-75% correct in determining the direction of
rotation) for the mistuned harmonic condition and the random frequency condition.
These  results  suggest  that  perceiving  sound  source  motion  of  multiple  sounds  is  very 
dependent  on  the  perceptual  relationship  of  one  sound  to  the  other  sounds  (e.g.,  are 
sounds  harmonically  related?).  The  results  also  suggest  that  sound  source  motion  may  not 
be  a  good  cue  for  segregating  sound  sources. 
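
The harmonic and mistuned tone sets described above can be written out as a small sketch. The 250-Hz fundamental and the 613-Hz mistuned component come from the text; the function names are hypothetical, and the random-frequency condition is omitted because the report does not say how those frequencies were drawn.

```python
def harmonic_tones(f0_hz, count):
    """First `count` harmonics of f0_hz."""
    return [f0_hz * k for k in range(1, count + 1)]

def mistune(tones_hz, harmonic_number, new_hz):
    """Copy of tones_hz with one harmonic (1-based) replaced by new_hz."""
    out = list(tones_hz)
    out[harmonic_number - 1] = new_hz
    return out

def is_harmonic_of(freq_hz, f0_hz, tol_hz=1e-6):
    """True if freq_hz lies within tol_hz of an integer multiple of f0_hz."""
    ratio = freq_hz / f0_hz
    return abs(ratio - round(ratio)) * f0_hz <= tol_hz

harmonic_case = harmonic_tones(250, 4)
mistuned_case = mistune(harmonic_case, 2, 613)
print(harmonic_case, mistuned_case)
print([is_harmonic_of(f, 250) for f in mistuned_case])
```

In the mistuned set only the second component fails the harmonicity check, which is the single perceptual difference between the two tonal conditions.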

In the paper (31) in which listeners were asked to determine the number of sound sources
they  perceived,  making  the  sound  at  one  source  more  intense  would  most  likely  increase 
its  probability  of  being  perceived.  This  is  similar  to  a  spatial  release  from  masking  (SRM) 
study  in  which  the  threshold  for  detecting,  discriminating,  or  recognizing  a  target  sound 
in  the  presence  of  spatially  separated  masker  sound  sources  is  lower  (target  easier  to 
process)  than  if  the  target  and  maskers  are  all  co-located  at  the  same  source.  In  most 
SRM  studies  the  maskers  are  asymmetrically  located  relative  to  a  centered  target  sound 
source.  In  these  cases  the  target  may  be  processed  based  on  binaural  processing  and/or 
because the target-to-masker ratio is higher at the ear farthest from the masker sound
source (i.e., the masker is attenuated by the head, the head-shadow effect). When the masker
sources are symmetrically spaced around a target source, target processing would have to
be based on binaural processing as both ears receive the same target-to-masker ratio.

A  large-scale  study  (18)  was  conducted  to  determine  SRM  for  masker  source 
configurations  in  which  the  maskers  were  always  symmetrically  located  relative  to  a 
centered  target  sound  source  (i.e.,  when  target  processing  in  the  presence  of  the  masker 
would  involve  only  binaural  processing).  Two,  four,  or  six  maskers  (all  one-word  country 
names  uttered  by  male  talkers)  were  used  to  mask  centered  target  words  (one-word 
country  names  uttered  by  a  female  talker).  Masking  when  the  maskers  were  spatially 
separated  from  the  target  was  compared  to  conditions  when  all  words  (target  and 
maskers)  were  co-located  at  the  center  loudspeaker  (always  the  location  of  the  target 
word).  Masking  was  also  measured  when  the  maskers  were  filtered  and  modulated  noise 
bursts  and  when  the  target  and  masker  words  were  filtered  through  different  filters.  When 
maskers  are  noises,  masking  is  assumed  to  be  primarily  “energetic”  masking.  When  the 
maskers and target words are differentially filtered to reduce spectral overlap of the masker
and  target  sounds,  masking  is  primarily  “informational”  (i.e.,  masking  is  largely  due  to 
the  similarity  of  the  masker  and  target  words).  When  the  target  and  masker  sounds  are 
both  unfiltered  words,  masking  is  assumed  to  be  a  combination  of  “energetic”  and 
“informational”  masking.  Thus,  in  addition  to  varying  the  number  of  maskers  the  type  of 
masker was also varied: unfiltered speech, filtered and modulated noise, or filtered speech.
The target was always speech, but when the masker was filtered speech the target was
also filtered, using different filters than those used to filter the masker words.

In  this  study,  SRM  (difference  in  word  recognition  between  the  co-located  and  spatially 
separated  masker  conditions)  decreased  as  the  number  of  maskers  increased  from  two  to 
six,  and  there  was  almost  no  SRM  for  the  six-masker  conditions.  The  decrease  in  SRM 
occurred  for  the  noise  maskers  (energetic  masking),  for  the  differentially  filtered  maskers 
(informational  masking),  and  for  the  unfiltered  speech  maskers  (combination  of  energetic 
and informational masking). In fact, masking of speech targets by speech maskers (a
combination of energetic and informational masking) was equal to the sum of noise
masking (energetic masking) and masking for the differentially filtered targets and maskers
(informational  masking)  for  all  conditions. 
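
The SRM measure defined at the start of this section is a simple difference score, sketched below. The sign convention (separated minus co-located, so that positive values mean spatial separation helped) is an assumption, and the percent-correct values are hypothetical placeholders chosen only to show the computation; they are NOT results from study (18).

```python
def spatial_release_from_masking(separated_pct_correct, colocated_pct_correct):
    """SRM as the gain in word recognition when maskers are spatially separated."""
    return separated_pct_correct - colocated_pct_correct

# Hypothetical scores keyed by number of maskers: (separated, co-located).
# These numbers are illustrative only and are not taken from the report.
hypothetical_scores = {2: (80.0, 55.0), 4: (62.0, 50.0), 6: (46.0, 45.0)}
for n_maskers, (separated, colocated) in sorted(hypothetical_scores.items()):
    print(n_maskers, spatial_release_from_masking(separated, colocated))
```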

These  data  reinforce  the  previous  work  suggesting  that  the  auditory  system  cannot 
differentially  process  more  than  about  four  simultaneous  speech  sounds  even  when  their 
sources  are  spatially  separated.  In  the  SRM  study  making  one  of  the  sounds  (the  target 
sound)  more  intense  did  not  improve  its  intelligibility  when  more  than  four  masking 
sounds were spatially separated from the target sound source as compared to when all
sounds  were  co-located  at  the  same  loudspeaker. 

Thus  several  studies  lead  to  the  conclusion  that  the  auditory  scene  is  small,  probably 
limited to four or fewer sound sources, when the sounds from all of the sources occur at
about  the  same  time.  It  is  probable  that  human  listening  cannot  segregate  more  than  about 
four  sound  sources  without  the  aid  of  some  external  signal-processing  device. 

In 1939 Wallach proposed that not only could head movements help resolve
cone-of-confusion errors, but head movements could also facilitate sound source localization
off the azimuth plane in elevation. We (17) conducted experiments based on signal
processing algorithms to determine if an automated system could be developed to use
head movements to determine multiple sound sources in both azimuth and elevation.
Both a cross-correlation approach and a Kalman filter application indicated that head
motion could be used to determine three sound sources that were located at different
elevations and azimuths. Thus, systems that involve receiver motion might be
advantageous in sound source localization tasks.
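
The basic building block of a cross-correlation approach, estimating the interaural delay between two receiver signals, can be sketched as follows. This is a generic illustration and not the algorithm used in (17): it handles a single source and no receiver motion, and the signal, delay, sample rate, and function names are assumptions.

```python
import random

def best_lag_samples(left, right, max_lag):
    """Lag (in samples) that maximizes the cross-correlation of left and right.

    A positive lag means the right signal is a delayed copy of the left one.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def lag_to_itd_us(lag_samples, sample_rate_hz):
    """Convert a lag in samples to an interaural time difference in microseconds."""
    return 1e6 * lag_samples / sample_rate_hz

# Hypothetical noise burst reaching the right receiver 5 samples after the left.
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(400)]
delay = 5
right = [0.0] * delay + left[:-delay]
lag = best_lag_samples(left, right, max_lag=20)
print(lag, round(lag_to_itd_us(lag, 44100), 1))
```

Repeating such an estimate as the receiver rotates yields a sequence of delays that changes differently for sources at different azimuths and elevations, which is the kind of information a tracking filter could then combine.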

Sound  Source  Localization  When  Listeners  Move 

At  the  beginning  of  the  grant  period  the  Spatial  Hearing  Lab  was  renovated  to  add 
additional  loudspeakers  and  to  include  a  computer  controlled  rotating  chair.  As  a  result  a 
series  of  studies  was  undertaken  to  study  sound  source  localization  processing  when  a 
listener moves (i.e., a listener is rotated in the computer controlled chair). A large scale
study (11) and several pilot studies (16, 19, 22, 28, 29, 30, 31, 44) have been conducted
related  to  this  topic. 

The  basic  hypothesis  is  that  sound  source  localization  requires  two  forms  of  information: 
1)  Information  about  the  auditory  spatial  cues,  and  2)  Information  about  the  location  of 
the  head.  That  is,  when  the  head  moves  the  auditory  spatial  cues  change  and  information 
about  the  position  of  the  head  is  required  so  that  a  stabilized  (veridical)  perception  of 
auditory space can occur. To test this hypothesis (see 11) listeners were rotated in the
chair  and  were  asked  to  make  sound  source  rotation  and  location  decisions.  They  did  so 
with  their  eyes  open  or  closed  to  control  visual  input  and  under  constant  velocity  or 
acceleration/deceleration  rotation  conditions  to  control  for  vestibular  input.  Since  the 
listeners  are  rotated  (as  opposed  to  moving  themselves)  at  a  slow  velocity,  there  are  no 
proprioceptive,  kinesthetic,  or  somatosensory  cues  related  to  rotation.  And,  no  prior 
experience  or  feedback  was  provided  to  the  listener  about  their  rotation,  so  there  was  no 
cognitive  information  based  on  experience  that  was  related  to  their  rotation. 

In several experiments (11, 19, 20, 22, 23, 28, 29, 31, 32, 45) the following results were
obtained when the listeners rotated at constant velocity with their eyes closed, thus
depriving them of all information about the position of their head. The prediction is that
in  this  case  sound  source  location  and  rotation  information  would  be  based  on  only 
auditory  spatial  cues  leading  to  all  spatial  perceptions  being  based  on  a  head-centric 
reference  system  (as  opposed  to  the  normal  world-centric  reference  system  used  to 
maintain  a  veridical  perception  of  auditory  space): 

A)  Stationary  sound  sources  were  perceived  as  rotating  in  a  direction  opposite  of  the 
listener’s  rotation. 

B)  When  the  sound  source  and  the  listener  rotated  at  the  same  velocity,  listeners  did 
not  perceive  the  sound  rotating. 

C) When the sound source rotated slower than the listener rotated, listeners
perceived the direction of sound rotation as opposite that of the actual sound
rotation, while when the sound rotated at a velocity faster than the listener, the
perception of the direction of sound rotation was the same as the actual rotation.

D)  Listeners  were  at  chance  in  their  ability  to  locate  the  source  of  the  sound  in  a 
world-centric  reference  system  (i.e.,  at  chance  in  indicating  which  loudspeaker 
presented  a  sound),  but  could  with  a  little  practice  indicate  where  they  perceived 
the  source  relative  to  their  head  (i.e.,  in  a  head-centric  reference  system). 

All of these outcomes are consistent with listeners only being able to judge the
head-centric location of sound since they were deprived of any information about the position
of  their  head.  Thus,  the  data  support  the  hypothesis  that  in  the  everyday  world  two  pieces 
of  information  (spatial  cues  and  head  position  cues)  are  required  to  locate  the  actual 
position  of  sound  sources  (e.g.,  to  operate  in  a  world-centric  reference  system). 
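
Outcomes A through C follow from simple relative-velocity arithmetic if perception is purely head-centric, as the sketch below illustrates. The sign convention (positive velocities are clockwise), the velocity values, and the function names are assumptions made for illustration.

```python
def head_centric_velocity(source_deg_per_s, listener_deg_per_s):
    """Angular velocity of the source relative to the rotating head."""
    return source_deg_per_s - listener_deg_per_s

def perceived_rotation(source_deg_per_s, listener_deg_per_s):
    """Direction of sound rotation predicted for a purely head-centric listener."""
    relative = head_centric_velocity(source_deg_per_s, listener_deg_per_s)
    if relative > 0:
        return "clockwise"
    if relative < 0:
        return "counterclockwise"
    return "none"

listener = 30.0  # hypothetical clockwise chair rotation, degrees per second
print(perceived_rotation(0.0, listener))   # A) stationary source
print(perceived_rotation(30.0, listener))  # B) source matches listener
print(perceived_rotation(10.0, listener))  # C) source slower than listener
print(perceived_rotation(50.0, listener))  # C) source faster than listener
```

A listener with access to head-position information could add the head velocity back in and recover the world-centric motion, which is the second piece of information the hypothesis calls for.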

Experiments are underway investigating how listener and sound motion affect
cone-of-confusion errors (e.g., front-back errors when the same ITDs and ILDs can be generated
by  more  than  one  sound  source  location,  see  30).  Experiments  are  also  exploring  the 
extent  to  which  multiple  sound  sources  and  somatosensory  cues  can  help  listeners 
determine  head  position  and,  thus,  allow  them  to  localize  the  actual  position  of  sound 
sources  (i.e.,  in  a  world-centric  reference  system)  when  they  are  deprived  of  visual  and 
vestibular information.

Abstract

The AFOSR Grant, FA9550-12-1-0312, has supported the Spatial Hearing Laboratory (SHL) at Arizona
State University for the past four years. The research conducted in the SHL has involved four main topics:
Sound Source Localization by Cochlear Implant (CI) Patients, Single Sound Source Localization Accuracy,
Multiple Sound Source Localization Identification, and Sound Source Localization When Listeners Move.
The CI research was also supported by an NIH grant ("Cochlear Implant Performance in Realistic Listening
Environments," Dr. Michael Dorman, Principal Investigator, Dr. William Yost unpaid advisor). The other three
topics cited above are entirely within the scope of the AFOSR grant.

Sound Source Localization by Cochlear Implant (CI) Patients: Several studies were conducted with three
patient populations (bilateral CI patients, CI patients with a CI at one ear and a hearing aid fit to the other ear,
and CI patients with one CI and normal or unaided near normal hearing at the other ear) showing that CI
patients who receive bilateral input can in most cases localize the source of sound but not as well as
normal hearing individuals. When CI patients can localize sound sources they usually do so by using
interaural level differences (ILDs) and not interaural time differences (ITDs).

Single Sound Source Localization Accuracy: Several experiments were conducted using a sound source
localization identification task to determine the variables that determine sound source localization accuracy
for  a  single  sound  source  in  the  front  azimuth  plane.  Stimulus  bandwidth  is  the  most  important  variable  with 
stimuli  having  bandwidths  greater  than  one  octave  yielding  the  best  accuracy,  and  accuracy  depends  on 
the  frequency  region  of  stimulation  for  bandwidths  less  than  an  octave.  Stimulus  duration,  overall  level,  and 
amplitude  envelope  have  a  very  small,  if  any,  effect  on  sound  source  localization  accuracy. 

Multiple  Sound  Source  Localization  Identification:  Two  independent  noise  sources  can  be  located  in  the 
frontal  azimuth  plane  but  not  as  well  as  one  noise  source.  A  scheme  in  which  the  combined  two-noise 
stimulus  is  analyzed  for  interaural  time  and  level  differences  in  small  temporal/spectral  cells  calculated  for 
the  combined  stimulus  was  shown  to  be  a  possible  way  in  which  two  or  more  simultaneous  sounds  could 
be  localized  at  different  sources.  Several  studies  indicate  that  the  maximum  number  of  spatially  separated 
and  simultaneously  presented  sound  sources  that  can  be  identified  and  localized  is  about  four  for  speech 
sounds and two or three for tonal sounds. Moving one or more sound sources only assists in segregating one
sound  source  from  other  sound  sources  when  the  sounds  are  not  perceptually  similar. 

Sound  Source  Localization  When  Listeners  Move:  The  results  from  several  experiments  support  the 
hypothesis  that  sound  source  localization  must  be  based  on  two  pieces  of  information:  information  about 
the  auditory  spatial  cues  and  information  about  the  position  of  the  listener's  head.  By  moving  (rotating) 
listeners  while  they  perform  sound  source  localization  tasks,  it  was  revealed  that  listeners  require 
information  about  the  position  of  their  heads  in  order  to  successfully  determine  the  veridical  location  of 
sources  in  the  actual  world. 